Multi-Gate Attention Network for Image Captioning
Authors
Abstract
The self-attention mechanism, which has been successfully applied to the current encoder-decoder framework of image captioning, is used to enhance the feature representation in the image encoder and to capture the most relevant information for the language decoder. However, existing methods assign attention weights to all candidate vectors, which implicitly hypothesizes that all vectors are relevant. Moreover, current self-attention mechanisms ignore the intra-object attention distribution and only consider inter-object relationships. In this paper, we propose a Multi-Gate Attention (MGA) block, which expands traditional self-attention by equipping it with an additional Attention Weight Gate (AWG) module and a Self-Gated (SG) module. The former constrains attention weights to be assigned to the most contributive objects. The latter is adopted to consider the intra-object attention distribution and to eliminate irrelevant information in the object feature vector. Furthermore, current image captioning methods apply the original transformer, designed for natural language processing tasks, to refine image features directly. Therefore, we propose a pre-layernorm transformer to simplify the architecture and make it more efficient for image feature enhancement. By integrating the MGA block with the pre-layernorm transformer into the image encoder and the AWG module into the language decoder, we present a novel Multi-Gate Attention Network (MGAN). Experiments on the MS COCO dataset indicate that MGAN outperforms state-of-the-art methods, and further experiments on other methods combined with MGA blocks demonstrate the generalizability of our proposal.
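The abstract describes the MGA block only at a high level, so the following is a minimal NumPy sketch to make the ideas concrete, not the authors' implementation. The specific gate formulas are assumptions: the AWG is modeled as a sigmoid gate on the attention scores that is multiplied into the softmax weights, and the SG module as an element-wise sigmoid gate over the channels of each value vector. The single attention head, the ReLU feed-forward sub-layer, and every function and parameter name (`gated_attention`, `self_gate`, `mga_block`, `b_awg`, and so on) are likewise hypothetical. The sketch does show the pre-layernorm ordering, in which each sub-layer sees a normalized input and its output is added back to the un-normalized residual stream.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each object vector (last axis); learned gain/bias omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_gate(x, w_g, b_g):
    # Hypothetical SG module: a per-channel sigmoid gate computed from each
    # object vector, so irrelevant feature channels can be pushed toward zero.
    return x * sigmoid(x @ w_g + b_g)

def gated_attention(x, p):
    # Single-head self-attention over n object vectors x of shape (n, d).
    q, k = x @ p["w_q"], x @ p["w_k"]
    v = self_gate(x @ p["w_v"], p["w_g"], p["b_g"])  # SG applied to values (assumed)
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) query-key relevance
    # Hypothetical AWG: a sigmoid gate in (0, 1) on the same scores, so keys with
    # low relevance receive almost no weight instead of a forced softmax share.
    weights = softmax(scores) * sigmoid(scores + p["b_awg"])
    return weights @ v @ p["w_o"], weights

def mga_block(x, p):
    # Pre-layernorm residual layout: x + Sublayer(LayerNorm(x)).
    attn_out, _ = gated_attention(layer_norm(x), p)
    h = x + attn_out
    ffn_out = np.maximum(0.0, layer_norm(h) @ p["w_1"]) @ p["w_2"]
    return h + ffn_out

def init_params(rng, d_model, d_ff=None):
    d_ff = d_ff or 4 * d_model
    s = 1.0 / np.sqrt(d_model)
    return {
        "w_q": rng.standard_normal((d_model, d_model)) * s,
        "w_k": rng.standard_normal((d_model, d_model)) * s,
        "w_v": rng.standard_normal((d_model, d_model)) * s,
        "w_o": rng.standard_normal((d_model, d_model)) * s,
        "w_g": rng.standard_normal((d_model, d_model)) * s,
        "b_g": np.zeros(d_model),
        "b_awg": 0.0,
        "w_1": rng.standard_normal((d_model, d_ff)) * s,
        "w_2": rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff),
    }

# Usage: refine 5 hypothetical region features of width 8.
rng = np.random.default_rng(0)
params = init_params(rng, d_model=8)
regions = rng.standard_normal((5, 8))
out = mga_block(regions, params)
print(out.shape)  # (5, 8)
```

Because each AWG gate value lies strictly between 0 and 1, every row of the gated weights sums to less than one. Under this assumed formulation, attention is therefore no longer forced to spread a full unit of weight across all candidate vectors.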
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3067607